
Transfer Learning for Video Recognition with Scarce Training Data for Deep Convolutional Neural Network


Abstract

Unconstrained video recognition and deep convolutional networks (DCNs) are two recently active topics in computer vision. In this work, we apply DCNs as frame-based recognizers for video recognition. Our preliminary studies, however, show that video corpora with complete ground truth are usually neither large nor diverse enough to learn a robust model. Networks trained directly on the video data set suffer from significant overfitting and have a poor recognition rate on the test set. The same lack-of-training-sample problem limits the use of deep models on a wide range of computer vision problems where obtaining training data is difficult. To overcome the problem, we perform transfer learning from images to videos, utilizing the knowledge in a weakly labeled image corpus for video recognition. The image corpus helps the network learn important visual patterns of natural images, patterns that are ignored by models trained only on the video corpus. The resulting networks therefore generalize better and achieve a higher recognition rate. We show that, by means of transfer learning from images to videos, we can learn a frame-based recognizer with only 4k videos. Because the image corpus is weakly labeled, the entire learning process requires only 4k annotated instances, far fewer than the million-scale image data sets required by previous works. The same approach may be applied to other visual recognition tasks where only scarce training data is available, and it improves the applicability of DCNs to various computer vision problems. Our experiments also reveal the correlation between meta-parameters and the performance of DCNs, given the properties of the target problem and data. These results lead to a heuristic for meta-parameter selection in future research that does not rely on time-consuming meta-parameter search.
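To make the image-to-video transfer step concrete, here is a minimal sketch in PyTorch of the general recipe the abstract describes, not the authors' exact pipeline: the ResNet-18 backbone, the 101-class label set, and the learning rates below are illustrative assumptions standing in for the paper's own DCN, image corpus, and hyper-parameters. The idea is to reuse convolutional filters learned from a large image corpus, swap in a new classifier head, and fine-tune on individual video frames.

```python
# Hedged sketch of image-to-video transfer learning; architecture and
# hyper-parameters are assumptions, not the paper's exact configuration.
import torch
import torch.nn as nn
from torchvision import models

NUM_VIDEO_CLASSES = 101  # assumption: e.g. an action-recognition label set

# 1) Start from a DCN whose convolutional layers were trained on a large,
#    weakly labeled image corpus (ImageNet weights stand in for that here).
backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# 2) Replace the image classifier head with one sized for the video labels;
#    the transferred filters keep the visual patterns learned from images.
backbone.fc = nn.Linear(backbone.fc.in_features, NUM_VIDEO_CLASSES)

# 3) Fine-tune on individual video frames (frame-based recognition): each
#    frame inherits its video's label, so a few thousand videos already
#    yield many training frames.
optimizer = torch.optim.SGD(
    [
        # fresh head: larger learning rate
        {"params": backbone.fc.parameters(), "lr": 1e-2},
        # transferred layers: smaller learning rate to limit overfitting
        {"params": [p for n, p in backbone.named_parameters()
                    if not n.startswith("fc")], "lr": 1e-3},
    ],
    momentum=0.9,
)
criterion = nn.CrossEntropyLoss()

def train_step(frames: torch.Tensor, labels: torch.Tensor) -> float:
    """One fine-tuning step on a batch of video frames (N, 3, 224, 224)."""
    optimizer.zero_grad()
    loss = criterion(backbone(frames), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

At test time, a video-level prediction can be obtained by averaging the frame-based recognizer's outputs over the video's frames; the smaller learning rate on the transferred layers is one common way to keep the image-derived filters from being washed out by the small video corpus.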
